Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delete resources when a new working copy is cancelled #8460

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

tylerjmchugh
Copy link
Contributor

When a newly created working copy is cancelled the working copy is deleted. The associated resources however are not deleted resulting in orphaned files.

This PR aims to fix this issue by also deleting the associated resources when a new working copy is deleted using the "cancel" button in the editor. This functionality is already present when using the "cancel working copy" button on the record view.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

@tylerjmchugh tylerjmchugh marked this pull request as ready for review October 23, 2024 17:16
@ianwallen ianwallen requested a review from josegar74 October 23, 2024 19:52
@ianwallen ianwallen added the bug label Oct 23, 2024
Copy link
Member

@josegar74 josegar74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cancelling the working copy works fine, but with this other use case, which also calls the same method I am having some problems, sometimes files are deleted, but sometimes not.

  1. Approve a metadata
  2. Edit it and click the Cancel button in the metadata editor.

Can you check at your end?

I also see some error related to Elasticsearch, but I'm not sure yet, if related to this change or something else:

2024-10-29T15:12:47,663 ERROR [geonetwork.datamanager] - Couldn't cleanup draft 124
org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/gn-records/_delete_by_query?q=%2Bid%3A124&refresh=true], status line [HTTP/1.1 409 Conflict]
{"took":1,"timed_out":false,"total":1,"deleted":0,"batches":1,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"gn-records","id":"0e8238e7-22c6-4ba8-a1f2-fb53c04b5aac-draft","cause":{"type":"version_conflict_engine_exception","reason":"[0e8238e7-22c6-4ba8-a1f2-fb53c04b5aac-draft]: version conflict, required seqNo [34], primary term [1]. current document has seqNo [36] and primary term [1]","index_uuid":"7n67lyIxRLWp99agH0HzcA","shard":"0","index":"gn-records"},"status":409}]}

@tylerjmchugh
Copy link
Contributor Author

tylerjmchugh commented Oct 29, 2024

Cancelling the working copy works fine, but with this other use case, which also calls the same method I am having some problems, sometimes files are deleted, but sometimes not.

  1. Approve a metadata
  2. Edit it and click the Cancel button in the metadata editor.

Can you check at your end?

I also see some error related to Elasticsearch, but I'm not sure yet, if related to this change or something else:

2024-10-29T15:12:47,663 ERROR [geonetwork.datamanager] - Couldn't cleanup draft 124
org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/gn-records/_delete_by_query?q=%2Bid%3A124&refresh=true], status line [HTTP/1.1 409 Conflict]
{"took":1,"timed_out":false,"total":1,"deleted":0,"batches":1,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"gn-records","id":"0e8238e7-22c6-4ba8-a1f2-fb53c04b5aac-draft","cause":{"type":"version_conflict_engine_exception","reason":"[0e8238e7-22c6-4ba8-a1f2-fb53c04b5aac-draft]: version conflict, required seqNo [34], primary term [1]. current document has seqNo [36] and primary term [1]","index_uuid":"7n67lyIxRLWp99agH0HzcA","shard":"0","index":"gn-records"},"status":409}]}

@josegar74 An issue I just found while attempting to test this with iso19139 is that during the draft creation process (DraftMetadataUtils.createDraft) the approved metadata id is returned instead of the new draft id. I'm wondering if this is related to the issue you have. When testing with iso19139.ca.HNAP this issue does not occur and the draft id is returned.

It seems that within DraftMetadataUtils.createDraft the repository needs to be flushed after metadataManager.insertMetadata and before metadataOperations.deleteMetadataOper as metadataOperations.deleteMetadataOper clears the persistence context. But from the documentation it seems that flush is deprecated and not recommended and I'm not sure if @Modifying(clearAutomatically=true) is required on OperationAllowedRepository.deleteAllByMetadataId.

@ianwallen ianwallen added this to the 4.4.7 milestone Oct 31, 2024
@josegar74
Copy link
Member

@tylerjmchugh thanks for the feedback, DraftMetadataUtils.createDraft afaik should be returning the draft id. I'll try the options you describe and get back to you.

I am also not sure why ISO HNAP seems to work fine and ISO19139 does not.

@CLAassistant
Copy link

CLAassistant commented Dec 8, 2024

CLA assistant check
All committers have signed the CLA.

@josegar74
Copy link
Member

josegar74 commented Dec 23, 2024

@tylerjmchugh I've done additional tests about this case:

  1. Approve a metadata
  2. Edit it and click the Cancel button in the metadata editor.

and found that for iso19139, the folder is not deleted when the files are used for metadata thumbnails. When they are used only as online resources, I could not reproduce the problem.

Can you check at your end with HNAP?


When it fails, it caused by the delete in Elasticsearch, the internal identifier that is deleted is fine, but throws this error:

2024-12-23T11:59:17,741 ERROR [geonetwork.datamanager] - Couldn't cleanup draft 162
org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/gn-records/_delete_by_query?q=%2Bid%3A162&refresh=true], status line [HTTP/1.1 409 Conflict]
{"took":3,"timed_out":false,"total":1,"deleted":0,"batches":1,"version_conflicts":1,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[{"index":"gn-records","id":"9635ae76-fbad-4b6d-a6c4-28093b6ebc52-draft","cause":{"type":"version_conflict_engine_exception","reason":"[9635ae76-fbad-4b6d-a6c4-28093b6ebc52-draft]: version conflict, required seqNo [351], primary term [1]. current document has seqNo [353] and primary term [1]","index_uuid":"Jplduz3MRwymrr0VBqGPEA","shard":"0","index":"gn-records"},"status":409}]}

According to the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html

If a document changes between the time that the snapshot is taken and the delete operation is processed, it results in a version conflict and the delete operation fails.
...
You can opt to count version conflicts instead of halting and returning by setting conflicts to proceed.

It seems some update is done to the index during the working copy cancelation is causing this issue.

@josegar74
Copy link
Member

I've tried these 2 changes:

    searchManager.forceIndexChanges();
    searchManager.delete(String.format("+id:%s", id));

This doesn't solve the problem

  1. In
    DeleteByQueryRequest request = DeleteByQueryRequest.of(
    b -> b.index(new ArrayList<>(Arrays.asList(index)))
    .q(query)
    .refresh(true));
        DeleteByQueryRequest request = DeleteByQueryRequest.of(
            b -> b.index(new ArrayList<>(Arrays.asList(index)))
                .q(query)
                .conflicts(Conflicts.Proceed)
                .refresh(true));

This solves the problem.

I think it can be quite safe to use it. But would like a second opinion from @fxprunayre as this code is used in other places. Also if you know what can make the difference when cancelling the editing of a working copy without thumbnails and with thumbnails to cause the error indicated in the previous comment? Can be related to https://github.com/geonetwork/core-geonetwork/blob/main/core/src/main/java/org/fao/geonet/kernel/search/index/OverviewIndexFieldUpdater.java?

@tylerjmchugh
Copy link
Contributor Author

@tylerjmchugh I've done additional tests about this case:

  1. Approve a metadata
  2. Edit it and click the Cancel button in the metadata editor.

and found that for iso19139, the folder is not deleted when the files are used for metadata thumbnails. When they are used only as online resources, I could not reproduce the problem.

Can you check at your end with HNAP?

@josegar74 I have retested and I can confirm that with the changes introduced in this PR the folders are deleted except for thumbnails with iso19139.

I also tested HNAP again but even with resources used for thumbnails the elasticsearch error does not occur and the folder is deleted successfully.

I'm not sure why there is a difference between the two schemas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants